npj Precision Oncology
○ Springer Science and Business Media LLC
Preprints posted in the last 90 days, ranked by how well they match npj Precision Oncology's content profile, based on 14 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.
Leyva, A.; Akbar, A.; Niazi, K.
Molecular subtyping of cancer is traditionally defined in transcriptomic space, yet routine clinical deployment is limited by the availability and cost of sequencing. Meanwhile, histopathology captures rich morphological information that is known to correlate with molecular state but lacks a principled, mechanistic bridge to gene-level representations. We propose a graph-constrained learning framework that aligns morphology-derived signals with a fixed, data-driven gene network discovered via hierarchical Monte Carlo screening. New gene sets for classification are derived by random sampling, and the coexpression network of that graph is used to enforce the learning of a pure morphology model without using gene expression. The resulting model performs subtype prediction using morphology alone, while being explicitly forced to operate through a gene-structured latent space. Structural alignment is enforced during training. For Moffitt classification in pancreatic cancer using PANCAN and TCGA datasets, the model has a reported AUC of 85% using an alternative gene set network structure, while the alternative gene set itself has an AUC of 84% across all pancreatic cancer patients in the dataset that had a subtype classification. This demonstrates that virtual transcriptomics can provide biologically grounded molecular insights using only routine histopathology slides, potentially expanding access to precision oncology in resource-limited settings.
Shady, M.; Reardon, B.; Jiang, S.; Pimenta, E.; O'Meara, T.; Park, J.; Kehl, K. L.; Elmarakeby, H. A.; Sunyaev, S. R.; Van Allen, E. M.
Introduction: Precision oncology has informed cancer care by enabling the discovery and application of diagnostic, prognostic, and/or predictive molecular biomarkers. However, many patients lack actionable biomarkers or fail to respond to biomarker-directed therapies. Patient similarity approaches can leverage comprehensive tumor profiling and prior clinical experiences from large cohorts for decision support, facilitating broader realization of precision oncology insights. Methods: We developed a deep learning-based modeling framework using real-world clinicogenomic data from a tertiary cancer center to (i) measure patient similarity based on embedded tumor genomic profiles and (ii) evaluate the association of derived patient subgroups and neighborhoods with shared therapeutic outcomes in breast cancer-specific and histology-agnostic pan-cancer settings. Results: The model recovered clinically meaningful patient clusters reflecting both expected and previously unknown therapeutic associations, as well as patient-specific neighborhoods that could inform therapeutic trajectories more often than expected by chance in multiple clinical contexts. Moreover, model utility extended to patients without actionable genomic biomarkers and those with cancer of unknown primary (CUP) diagnoses, where neighborhoods aligned with independently predicted primary cancer type. These neighborhoods could also be examined over time in a continuously learning scenario. Conclusion: This similarity-based modeling framework distilled complex molecular and clinical data into concise, context-specific insights that augment clinician judgment, providing a foundation for a real-time learning, patient-centered decision support model in precision oncology.
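The idea of a patient-specific neighborhood in an embedding space can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or neighborhood size, so cosine similarity, the value of `k`, the function names, and the toy embeddings below are all assumptions made for illustration, not the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(query_id, embeddings, k=2):
    """Return the ids of the k patients most similar to query_id, most similar first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), pid)
        for pid, vec in embeddings.items()
        if pid != query_id  # a patient is not their own neighbor
    ]
    # Sort by descending similarity; break ties by patient id for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [pid for _, pid in scored[:k]]


# Hypothetical 3-dimensional tumor-profile embeddings (toy values).
embeddings = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.9, 0.1, 0.0],
    "P3": [0.0, 1.0, 0.0],
    "P4": [0.7, 0.7, 0.0],
}
neighborhood = nearest_neighbors("P1", embeddings, k=2)
print(neighborhood)
```

In a setting like the one described, the outcomes of the patients in `neighborhood` would then be summarized for the query patient; that step is omitted here because the abstract gives no detail on it.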
Ahmad Zafar, S.; Qin, W.; Chengliang, L.; Khan, A. A.; Nazir, A.; Batool, H.; Khalid, F.; Faisal, M. S.
Homologous recombination deficiency (HRD) confers sensitivity to poly (ADP-ribose) polymerase (PARP) inhibitors and platinum-based chemotherapy, representing a critical biomarker for precision oncology across multiple malignancies. Current HRD assessment relies on next-generation sequencing of genomic scar signatures, but specialized infrastructure requirements, high costs, and prolonged turnaround times limit widespread adoption. These barriers restrict access to HRD testing, particularly in resource-constrained settings where the majority of cancer patients receive care. Pan-cancer HRD prediction has been shown, but robustness across histologies and institutions, leak-safe evaluation, and backbone-dependent generalization remain incompletely characterized. Here we show that IHGAMP (Integrative Histopathology-Genomic Analysis for Molecular Phenotyping), a computational framework using vision transformer foundation models, predicts HRD status from H&E images with an AUROC of 0.766 (95% CI 0.727-0.803) on the TCGA held-out test set using OpenCLIP embeddings, and improves to 0.827 with histopathology-pretrained OpenSlideFM embeddings under the same leak-safe protocol. External evaluation on 927 patients (2,718 whole slide images) from seven independent cohorts demonstrated generalization in adenocarcinoma/serous settings (e.g., CPTAC-LUAD AUROC 0.723) and enabled platinum resistance prediction in PTRC-HGSOC (AUROC 0.673), with attenuation in squamous histologies. Systematic comparison of foundation-model embeddings showed that OpenSlideFM outperformed OpenCLIP internally on TCGA (0.827 vs 0.766 AUROC) and improved external generalization in select cohorts (e.g., CPTAC-LUAD), while performance remained attenuated in squamous histologies; TSS-level embedding norm stability across 710 tissue source sites suggested limited site-driven magnitude shifts. 
Our findings establish that routine histopathology contains morphology associated with HRD that enables moderate, histology-dependent prediction, supporting a potential screening/triage role to prioritize confirmatory molecular testing where appropriate.
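The AUROC values and 95% confidence interval quoted in this abstract can be illustrated with a small sketch. The abstract does not say how the interval was computed; the percentile bootstrap used below is an assumption, as are the function names and the toy labels and scores.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = HRD-positive, 0 = HRD-negative (hypothetical values).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
point_estimate = auroc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
```

With so few cases the interval is very wide; the sketch only shows the mechanics of pairing a point estimate with a resampling-based interval.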
Maitra, C.; Das, V.; Seal, D. B.; De, R. K.
Lung cancer is characterized by profound intratumoral and inter-patient heterogeneity, spanning histological subtypes, molecular landscapes, and the tumor microenvironment. While multi-omics integration is essential for capturing this complexity, leveraging these data to explicitly define survival-associated subpopulations remains a significant challenge. In this study, we developed NeuroMDAVIS-FS, an unsupervised deep learning framework designed to stratify lung cancer patients by survival risk, and identify molecular determinants underlying improved clinical outcomes. Using the CPTAC cohort, we integrated genomic (CNV), transcriptomic (RNA-seq), and proteomic profiles to extract modality-specific features. Candidate biomarkers were validated through Kaplan-Meier (KM) survival analysis and univariate Cox proportional hazards (CoxPH) regression. A final multivariate CoxPH model effectively stratified patients into high-risk and low-risk cohorts (Kaplan-Meier p-value < 0.001). Notably, the integration of these molecular features with baseline clinical models significantly enhanced prognostic accuracy, improving the concordance index by 43.79% in LUAD, 31.05% in LSCC, and 23.76% across the pan-lung cancer cohort. These results demonstrate that NeuroMDAVIS-FS identifies robust, biologically relevant features that surpass traditional clinical variables in predicting patient outcomes, offering a scalable path for precision oncology.
Wang, X.; Chen, Y.; Liu, X.; Qiu, C.; Tang, H.; Huang, T.; Guo, S.; Ma, S.; Cai, M.; Sun, Q.; Chang, Z.; Liu, J.; Wang, X.; Li, J.; Qian, W.; Wang, B.; Zhang, B.; Bai, C.; Shi, M.; Zhang, X.; Li, M.; Wang, J.; Wang, B.; Ma, J.; Ai, L.; Yu, S.; Wang, L.; Feng, N.; Liu, X.; Yu, G.
The histological heterogeneity of primary tumours across the pan-cancer spectrum poses a formidable barrier to accurate lymph node metastasis assessment, often causing AI systems to make "overconfident errors" on rare variants that lead to missed diagnoses. To address this, we present UPATHLN, a unified diagnostic platform that synergizes a pathology foundation model-based encoder with a decoupled uncertainty estimation mechanism. We developed and validated the system using a large-scale multicentre dataset of 26,229 lymph nodes from 14 distinct primary origins. In internal validation, UPATHLN achieved an area under the curve (AUC) of 0.986. Crucially, the uncertainty module functioned as a decisive fail-safe: by flagging potential false-negative predictions for mandatory pathologist review, it intercepted all missed diagnoses, securing 100% conditional sensitivity across both the development and independent test cohorts--even for tumours from seven unseen primary origins. Concurrently, this mechanism reduced the review burden on negative lymph nodes by 73.2%. Ultimately, UPATHLN sets a new benchmark for safety-critical AI, demonstrating that explicitly modelling uncertainty is key to unlocking reliable, workload-efficient diagnostics at the pan-cancer scale.
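The fail-safe role of the uncertainty module can be sketched as a simple triage rule. The abstract does not define the flagging rule or give a formula for conditional sensitivity, so everything below is an assumption: a node goes to pathologist review if the model calls it positive or if its uncertainty reaches a hypothetical threshold, and the two metrics are computed under that rule on made-up cases.

```python
def triage(cases, threshold):
    """Summarize a review rule over cases with keys 'truth', 'pred' (1/0) and 'uncertainty'.

    Assumed rule: a lymph node is sent to a pathologist when the model predicts
    metastasis or when the uncertainty score is at or above the threshold.
    """
    positives = [c for c in cases if c["truth"] == 1]
    negatives = [c for c in cases if c["truth"] == 0]
    if not positives or not negatives:
        raise ValueError("need both metastatic and benign cases")

    def needs_review(case):
        return case["pred"] == 1 or case["uncertainty"] >= threshold

    caught = sum(1 for c in positives if needs_review(c))
    auto_cleared = sum(1 for c in negatives if not needs_review(c))
    return {
        # Share of metastatic nodes that reach a pathologist under the rule.
        "sensitivity_after_review": caught / len(positives),
        # Share of benign nodes that need no pathologist review.
        "negative_review_reduction": auto_cleared / len(negatives),
    }


# Hypothetical cases: the second one is a model miss rescued by high uncertainty.
cases = [
    {"truth": 1, "pred": 1, "uncertainty": 0.05},
    {"truth": 1, "pred": 0, "uncertainty": 0.60},
    {"truth": 0, "pred": 0, "uncertainty": 0.10},
    {"truth": 0, "pred": 0, "uncertainty": 0.20},
    {"truth": 0, "pred": 0, "uncertainty": 0.70},
    {"truth": 0, "pred": 1, "uncertainty": 0.30},
]
summary = triage(cases, threshold=0.5)
```

Lowering the threshold trades a larger review burden for a smaller chance that a false negative slips through, which is the trade-off the reported figures describe.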
Lehtonen, O.; Nordlund, N.; Kahelin, E.; Bergqvist, L.; Aro, K.; Hautaniemi, S.; Kalliala, I.; Virtanen, A.
Cervical intraepithelial neoplasia grade 2 (CIN2) lesions show variable outcomes, and accurate prediction of regression remains a major clinical challenge. We developed an interpretable machine learning pipeline that integrates quantitative histological, clinical, and human papillomavirus (HPV) genotyping data to predict lesion regression within one and two years. Using panoptic segmentation of routine hematoxylin and eosin (H&E)-stained biopsies, we extracted human-interpretable morphological and immune cell infiltration-related features that capture the key histopathological characteristics of CIN2 and identified features that predicted lesion regression. Further, integrating these features with predictive clinical features achieved higher predictive accuracy than clinical variables alone. These findings demonstrate that quantitative, interpretable analysis of H&E histology of routine diagnostic biopsies contains relevant information that predicts the natural history of CIN2 lesions. [Graphical Abstract: Figure 1. Created in BioRender. Lehtonen, O. (2026) https://BioRender.com/rlnkbkp]
Xu, S.; Wang, Z.; Wang, H.; Ding, Z.; Zou, Y.; Cao, Y.
Online cancer peer-support communities generate large volumes of patient-authored and caregiver-authored text that may reflect distress, coping, and informational needs. Automated emotional tone classification could support scalable monitoring, but supervised modeling depends on label quality and may benefit from explicit context features. Using the Mental Health Insights: Vulnerable Cancer Survivors & Caregivers dataset, we compared five model families (TF-IDF Logistic Regression, Random Forest, LightGBM, GRU, and fine-tuned ALBERT) on a three-class target (Negative/Neutral/Positive) derived from four original categories. We introduced two extensions: (i) LLM-based annotation to generate parallel "AI labels" and (ii) token-based augmentation that prepends LLM-extracted structured variables (reporter role and cancer type) to the post text. Models were trained with a 60/20/20 stratified train/validation/test split, with hyperparameters selected on validation data only. Test performance was summarized using weighted F1 and macro one-vs-rest AUC with bootstrap confidence intervals, with paired comparisons based on McNemar tests and false discovery rate adjustment. The LLM annotator produced substantial redistribution in the four-class label space, shifting prevalence toward very negative relative to the original labels; the shift persisted but attenuated after collapsing to three classes. Across all model families, token augmentation improved held-out performance, with the largest gains for GRU and consistent improvements for ALBERT. Augmentation also reduced polarity-reversing errors (Negative ↔ Positive) for ALBERT, while adjacent errors (Negative ↔ Neutral) remained the dominant residual failure mode.
These results indicate that LLM-based supervision can introduce systematic measurement shifts that require auditing, yet LLM-extracted context incorporated via simple token augmentation provides a pragmatic, model-agnostic mechanism to improve downstream emotional tone classification for supportive oncology decision support. Author summary: We studied how to better monitor emotional tone in posts from online cancer peer-support communities, where patients and caregivers share experiences that may signal distress, coping, or unmet needs. Automated classification could help organizations and moderators identify when additional support may be needed, but these systems depend on the quality of the labels used for training and may miss clinical context. Using a public dataset of cancer survivor and caregiver posts, we trained and compared several machine-learning and deep-learning models to classify each post as negative, neutral, or positive. We tested two practical improvements. First, we used a large language model to generate an additional set of "AI labels" and examined how these differed from the original categories. Second, we extracted simple context information (whether the writer was a patient or caregiver and what cancer type was mentioned) and added this context to the text before model training. We found that adding context consistently improved performance across model types. However, the AI-generated labels shifted class distributions, indicating that automated labeling can introduce systematic changes that should be audited. Overall, simple context extraction can make emotional tone monitoring more accurate and useful for supportive oncology decision support.
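The paired model comparison mentioned in this abstract relies on McNemar tests, which look only at the posts where two classifiers disagree about correctness. The abstract does not state which variant was used; the exact binomial form below is an assumption, and the function names and counts are illustrative.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count posts where exactly one of two classifiers is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a == y and o != y)
    c = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a != y and o == y)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree, so there is no evidence either way
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: the baseline alone is right on 2 posts, the augmented model alone on 10.
p_value = mcnemar_exact(2, 10)

# Tiny labeled example showing how the counts are derived.
y_true = ["neg", "neu", "pos", "neg"]
baseline = ["neg", "pos", "neg", "neu"]
augmented = ["neg", "neu", "pos", "neu"]
counts = discordant_counts(y_true, baseline, augmented)
```

When several such comparisons are run, the resulting p-values would then go through a false discovery rate adjustment, as the abstract notes.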
Kilim, O.; Martinez Ruiz, C.; Pipek, O.; Sztupinszki, Z.; Huebner, A.; Diossy, M.; Prosz, A.; Moore, D.; Jamal-Hanjani, M.; Hackshaw, A.; Fillinger, J.; Moldvay, J.; Csabai, I.; Swanton, C.; Szallasi, Z.
The standard treatment for stage I lung adenocarcinoma is surgical resection, in most cases without additional systemic adjuvant treatment. A significant proportion of stage I cases recur, with a 5-year survival rate of less than 50%. There are clinical data suggesting that adjuvant treatment may improve survival in such recurrent cases. However, previously evaluated predictors such as the IASLC grading system from histological sections and transcriptomic profiles have not been sufficiently accurate and consistent for risk stratification and to guide therapeutic interventions. We hypothesized that these previously investigated diverse diagnostic measurements carry complementary information that may provide higher prognostic power when combined. Here we describe a multimodal deep learning method, PATH-ORACLE. This biomarker is built on top of the prospectively validated transcriptomic-based ORACLE score with the addition of routine histological sections processed by pre-trained foundation models. PATH-ORACLE predicts recurrence with an accuracy of over 85% in two independent cohorts. Given further validation, this predictor could be used to prioritize stage IB patients for adjuvant chemotherapy in a more consistent fashion. Furthermore, for stage IA cases, PATH-ORACLE combined with liquid biopsy-based monitoring may help identify high-risk patients suitable for adjuvant targeted therapy. Highlights: (1) Multimodal AI model (PATH-ORACLE) integrates histology and transcriptomics to predict stage I LUAD recurrence; (2) PATH-ORACLE outperforms IASLC grading and transcriptomic or image-based models alone; (3) Model achieves >85% recurrence prediction accuracy across independent international cohorts; (4) PATH-ORACLE refines risk stratification within both stage IA and IB lung adenocarcinoma; (5) Biomarker may guide adjuvant therapy selection and surveillance in early-stage disease.
Hu, Y.; Batchkala, G.; Gaitskell, K.; Domingo, E.; Li, B.; Zhang, T.; Li, Z.; Friedrich, M.; Woodcock, D.; Verrill, C.; Rittscher, J.
Computational-pathology foundation models (PFMs) have demonstrated remarkable accuracy in a wide range of whole-slide image (WSI) analyses, yet their morphological reasoning and potential biases remain opaque. Here we introduce an attention-shift monitoring framework that tracks tissue-level attention influx and efflux before and after fine-tuning a slide-level aggregator. We apply our interpretable framework across five clinically relevant tasks (lymph-node metastasis detection, lung-cancer subtyping, ovarian-cancer drug-response prediction, colorectal-cancer molecular classification and Marsh grading of colitis). We compare two market-validated PFMs, UNI and prov-GigaPath, using dynamically pooled, compressed embeddings under identical running conditions. Although both models achieve comparable ROC-AUC and balanced-accuracy scores, their attention-shift trajectories diverge sharply: each exhibits broad attention efflux from most tissue regions and highly concentrated, yet minimally overlapping, influx into distinct phenotypic zones. The attention heterogeneity in zero-shot mode and the inconsistency of post-tuning attention shifts indicate that the presentation of interpretability depends primarily on each model's intrinsic feature priors rather than on accuracy or fine-tuning. Our findings uncover a systemic stability gap in PFM interpretability, masked by high performance metrics, and underscore the need for richer explanation tools, bias-monitoring protocols and diversified pre-training strategies to ensure safe clinical deployment.
Salome, P.; Knoll, M.; Walz, D.; Cogno, N.; Dedeoglu, A. S.; Qi, A. L.; Isakoff, S. J.; Abdollahi, A.; Jimenez, R. B.; Bitterman, D. S.; Paganetti, H.; Chamseddine, I.
Introduction: Manual data extraction from unstructured clinical notes is labor-intensive and impractical for large-scale clinical and research operations. Existing automated approaches typically require large language models, dedicated computational infrastructure, and/or task-specific fine-tuning that depends on curated data. The objective of this study is to enable accurate extraction with smaller locally deployed models using a disease-site specific pipeline and prompt configuration that are optimized and reusable. Materials/Methods: We developed OncoRAG, a four-phase pipeline that (1) generates feature-specific search terms via ontology enrichment, (2) constructs a clinical knowledge graph from notes using biomedical named entity recognition, (3) retrieves relevant context using graph-diffusion reranking, and (4) extracts features via structured prompts. We ran OncoRAG using Microsoft Phi-3-medium-instruct (14B parameters), a midsize language model deployed locally via Ollama. The pipeline was applied to three cohorts: triple-negative breast cancer (TNBC; n_patients=104, n_features=42; primary development), recurrent high-grade glioma (RiCi; n_patients=191, n_features=19; cross-lingual validation in German), and MIMIC-IV (n_patients=100, n_features=10; external testing). Downstream task utility was assessed by comparing survival models for 3-year progression-free survival built from automatically extracted versus manually curated features. Results: The pipeline achieved mean F1 scores of 0.80 +/- 0.07 (TNBC; n_patients=44, n_features=42), 0.79 +/- 0.12 (RiCi; n_patients=61, n_features=19), and 0.84 +/- 0.06 (MIMIC-IV; n_patients=100, n_features=10) on test sets under the automatic configuration. Compared to direct LLM prompting and naive RAG baselines, OncoRAG improved the mean F1-score by 0.19 to 0.22 and 0.17 to 0.19, respectively. Manual configuration refinement further improved the F1-score to 0.83 (TNBC) and 0.81 (RiCi), with no change in MIMIC-IV.
Extraction time averaged 1.7-1.9 seconds per feature with the 14B model. Substituting a smaller 3.8B model reduced extraction time by 57%, with a decrease in F1-score (0.03-0.10). For TNBC, the extraction time was reduced from approximately two weeks of manual abstraction to under 2.5 hours. In an exploratory survival analysis, models using automatically extracted features showed a comparable C-index to those with manual curation (0.77 vs 0.76; 12 events). Conclusions: OncoRAG, deployed locally using a mid-size language model, achieved accurate feature extraction from multilingual oncology notes without fine-tuning. It was validated against manual extraction for both retrieval accuracy and survival model development. This locally deployable approach, which requires no external data sharing, addresses a critical bottleneck in scalable oncology research.
Teng, X.; Jiang, Y.; Cho, W. C.; Wang, H.; Ma, J.; Zhao, M.; Meng, X.; Xiao, H.; Lai, Q.; Zhang, X.; Xie, H.; Li, T.; Li, Z.; Ren, G.; Cheung, A. L.-Y.; Cai, J.
Background: Early and accurate prediction of pathological complete response (pCR) is essential for personalizing neoadjuvant chemotherapy (NACT) in invasive breast cancer. However, most high-performing predictive models rely on costly, multi-modal data that are not routinely available in standard clinical practice. Purpose: To develop and validate Breast Cancer Biological Multi-modal Information Transfer for Response Prediction Model (BC-BioMIXER), a biologically informed predictive model that transfers multi-omics-derived knowledge to routine clinical data, enabling accurate prediction of pathological complete response prior to neoadjuvant chemotherapy initiation. Material and Methods: BC-BioMIXER was developed in a multi-modality cohort of 648 patients with invasive breast cancer (T2-4, any N, M0) incorporating transcriptomic, proteomic, MRI, and clinical data. The model was externally validated in three independent cohorts (total N = 830), including one multi-modality cohort, one clinical trial cohort, and one contemporary real-world cohort. All patients received NACT followed by surgery. The framework employs a teacher-student knowledge-transfer paradigm in which a multi-omics teacher model learns biologically integrated representations that are subsequently transferred to a student model using only routine clinical data. Predictive performance for pCR was benchmarked against a multi-modality reference model and evaluated across cohorts, receptor-defined subgroups (HER2 and hormone receptor [HR]), and treatment groups (NACT with or without immune checkpoint inhibitors [ICI]). Prognostic value was assessed using distant recurrence-free survival (DRFS). The potential to inform immunotherapy decision-making was explored by comparing DRFS between NACT + ICI and NACT-alone groups within model-predicted pCR and non-pCR subgroups. Results: BC-BioMIXER achieved pCR prediction performance comparable to the multi-modality benchmark (AUC 0.82 vs. 0.85; p = 0.271) and demonstrated consistent discrimination across all validation cohorts (AUCs 0.82, 0.81, and 0.80; all p < 0.001). Patients predicted to achieve pCR experienced significantly improved 3-year DRFS (HR = 0.36; 95% CI, 0.20-0.67; p < 0.001). In patients treated with NACT + ICI, BC-BioMIXER showed numerically superior pCR prediction compared with PD-L1 expression alone (AUC 0.84 vs. 0.72; p = 0.08). Notably, within the model-predicted non-pCR subgroup, patients receiving NACT + ICI had significantly inferior DRFS compared with those receiving NACT alone (HR = 2.70; p = 0.032), whereas no significant difference was observed in the predicted pCR subgroup. Conclusion: BC-BioMIXER translates multi-omics-derived biological knowledge into a robust, routine-data-based predictive tool for breast cancer NACT. Its consistent validation across evolving clinical settings and its potential to inform personalized immunotherapy strategies highlight a step toward scalable and accessible precision oncology. Highlights: (1) Brings multi-omics power to routine clinical practice: Through cross-modality knowledge transfer, BC-BioMIXER leverages transcriptomic and proteomic data during training to enable highly accurate pCR prediction using only standard MRI and clinical variables (AUC 0.82 vs. 0.85 for full multi-modality benchmark, p=0.271). (2) Consistently strong and generalizable performance: Validated in three independent cohorts (total N=830), the model maintained robust pCR discrimination (AUC 0.80-0.82, all p<0.001) across receptor subtypes (HR/HER2) and treatment regimens, including with or without immune checkpoint inhibitors. (3) Guides personalized immunotherapy de-escalation: In HER2-negative patients predicted as non-pCR, adding ICI to neoadjuvant chemotherapy was associated with significantly worse distant recurrence-free survival (HR 2.70, p=0.032) compared to chemotherapy alone. This effect was not seen in the predicted pCR group, suggesting the model may help identify patients unlikely to benefit from additional immunotherapy.
Ugwueke, E. C.; Azzam, M.; Zhou, M.; Teply, B. A.; Bergan, R. C.; Wan, S.; Fojo, A. T.; Leuva, H.; Wang, J.
Background: Once treatment starts, early prediction of treatment benefit and its correlation with overall survival (OS) remains challenging in metastatic castration-resistant prostate cancer (mCRPC). Existing prognostic models require long-term follow-up, limiting their ability to inform timely treatment decisions. To address this gap, we evaluated tumor growth rate (g-rate)-based survival models across multiple treatment lines to assess their ability to predict OS and support early clinical decision-making. Methods: We developed GxSurv, a Random Survival Forest (RSF)-based framework that incorporates baseline clinical variables and g-rate calculated from serial on-treatment PSA to construct line-specific prediction models of OS, a direct measure of treatment outcome. Three variants were developed: G3Surv, using the 3-month g-rate; G6Surv, using the 6-month g-rate; and GfSurv, using the final observed g-rate. Model performance was evaluated using Harrell's C-index, Uno's C-index, Integrated Brier Score (IBS), and time-dependent area under the curve (tAUC). Model interpretability was assessed using permutation importance to quantify predictor contributions within the GxSurv framework. Findings: The study included 15912 treatment records from 11014 patients with mCRPC across four lines of therapy. We found that incorporation of g-rate consistently improved model performance across all treatment lines, with all GxSurv models outperforming Cox proportional hazards (CoxPH). As the earliest prognostic model, our G3Surv demonstrated strong early predictive performance, with Harrell's C-index values ranging from 0.700 to 0.746 and tAUC values of 0.766 to 0.822 across all lines, representing 5-8% and 4-5% improvements over CoxPH, respectively. These results indicate that G3Surv accurately predicts individual treatment outcomes at 3 months after treatment initiation.
Feature importance analyses consistently identified g-rate as a top predictor, followed by baseline PSA and hemoglobin, with relative variation across treatment lines. Interpretation: Integrating g-rate calculated from on-treatment PSA values enables accurate, line-specific prediction of treatment outcomes in mCRPC, with the 3-month g-rate providing robust early prognostic information to support timely, personalized clinical decision-making. Funding: U.S. National Science Foundation, National Institutes of Health, American Cancer Society.
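Harrell's C-index, one of the metrics this abstract reports, can be sketched in a few lines. This is a simplified textbook form that skips pairs with tied survival times; the function name and the toy follow-up data are assumptions, and it is not the evaluation code used in the study.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter follow-up time than patient j. The pair is concordant
    when patient i also has the higher predicted risk; tied risks count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: months of follow-up, event flag (1 = death, 0 = censored), predicted risk.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.1, 0.6]
c_index = harrell_c_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of 0.700 to 0.746 in context.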
Sanjaya, P.; Pitkänen, E.
Tumour typing from whole-genome sequencing is increasingly accurate, yet molecular subtyping from somatic variants remains challenging because of tumour heterogeneity and inconsistent clinical annotations. Here, we present Mutation-Attention Dual-Task (MuAt2), a Transformer model that jointly classifies histological tumour types and subtypes directly from somatic single-nucleotide variants, indels and structural variants. MuAt2 leverages encoders pre-trained on 2,587 pan-cancer whole genomes, and subsequently fine-tuned and evaluated on 14,527 tumour whole genomes from Genomics England spanning 15 tumour types and 68 subtypes. MuAt2 outperformed aggregated-feature deep baselines and conventional machine learning models. Fine-tuning improved both accuracy and calibration across independent cohorts processed with heterogeneous variant-calling pipelines. MuAt2 embeddings organised tumours by lineage and oncogenic processes, captured molecular subtype-defining driver events and improved prognostic stratification in gliomas. Finally, MuAt2 facilitated interpretation of metastatic tumours and cancers of unknown primary by inferring plausible tissue origins from somatic variant patterns. In conclusion, MuAt2 provides a transferable and interpretable modelling framework for cancer diagnosis and prognosis directly from whole-genome somatic variation.
Prelaj, A.; Miskovic, V.; Sacco, M.; Ferrarin, A.; Licciardello, C.; Provenzano, L.; Favali, M.; Lerma, L.; Zec, A.; Spagnoletti, A.; Ganzinelli, M.; Lorenzini, D.; Guirges, B.; Invernizzi, L.; Silvestri, C.; Mazzeo, L.; Meazza Prina, M.; Corrao, G.; Ruggirello, M.; Dumitrascu, A. D.; Di Mauro, R. M.; Monzani, D.; Pravettoni, G.; Zanitti, M.; Macocchi, D.; Marino, M.; Cavalli, C.; Romano, R.; Giani, C.; Armato, S. G.; Esposito, A.; Bestvina, C.; Spector, M.; Bogot, N. R.; Basheer, R.; Hafzadi, A. L.; Roisman, L.; Watermann, I.; Szewczyk, M.; Olchers, T.; Richter, H.; Blanke-Roeser, C.; Sinisca
Despite a decade of immunotherapy, treatment selection in non-small cell lung cancer (NSCLC) still relies on subgroup analyses and clinical scores. I3LUNG (NCT05537922) is currently the largest international, real-world, multimodal, artificial intelligence (AI)-based trial, enrolling 2365 patients. We integrated real-world clinical data (RWD), computed tomography (CT) images, digital pathology (DP), and genomics (G) into machine learning early-fusion (MLEF) and deep-learning intermediate-fusion (DLIF) models. MLEF achieved consistent performance across outcomes (AUC ≈ 0.74), with improved results in first-line patients (AUC up to 0.82). Multimodal models outperformed RWD in clinical-specific subgroups (AUCs up to 0.86). In the test set, AI models surpassed PD-L1, ECOG PS, NLR, LDH (all with p<0.01) and the LIPI score. The clinical usability study showed that expert and non-expert physicians could improve their prediction with the explainable AI (XAI) tool. The I3LUNG tool emerges as a clinically relevant decision-support system and is currently under prospective validation in >2,000 patients.
Guan, S.; Jian, Y.; Dong, W.; Dong, L.
Background: Neoadjuvant chemotherapy (NAC) is the standard of care for locally advanced breast cancer. However, the disconnect between efficacy in randomized trials and effectiveness in real-world practice, attributable to real-world treatment delays and adherence barriers, remains underexplored for early-stage (cT1-cT3) operable disease. Methods: We applied the Target Trial Emulation (TTE) framework to a propensity-score matched cohort from the SEER database. To mitigate immortal time bias and staging migration, we reconstructed clinical baselines. Individualized Treatment Effects (ITE) were estimated using a Double-Robust Causal Forest algorithm. To rigorously cross-validate these estimates against model misspecification, we employed a DeepCox neural network as a non-linear sensitivity analysis tool, exposing complex risk structures (e.g., U-shaped hazards) that traditional linear assumptions might overlook. Results: In the matched cohort (N=26,946), Standard NAC was associated with an operational survival deficit (Absolute Risk Difference: 3.6%) compared to upfront surgery, corresponding to a hazard ratio of 1.32 (95% CI, 1.24-1.40; p < 0.001). Causal Forest analysis revealed a critical "Response-Survival Discordance": while young TNBC patients exhibited high nodal pathologic complete response (npCR) rates, they paradoxically faced the worst survival outcomes (Standard Cox HR 1.87). Even in the 6-month landmark analysis to account for immortal time bias, this survival detriment persisted (Landmark HR 1.39; 95% CI, 1.06-1.81; p = 0.016; Figure 3D). Crucially, node-positive (cN+) patients, traditionally considered ideal candidates for systemic downstaging, experienced a significant survival detriment with NAC (HR 1.39). This disadvantage was most pronounced in Luminal A subtype and Invasive Lobular Carcinoma (ILC), where NAC failed to provide effective source control.
In contrast, HER2-positive status exhibited a trend towards survival benefit, diverging from the significant risks observed in other subtypes. Anatomically, while cT2 tumors identified a "window of minimal operational deficit" where the absolute risk difference was negligible, operational risk paradoxically resurged in cT3 tumors, challenging the conventional paradigm that larger burdens inherently mandate downstaging.
Figure 3: The Heterogeneity Landscape of Treatment Effects. (A) Individualized Treatment Effect (ITE) Waterfall Plot: Visualizes the distribution of treatment effects across the cohort. The prominent red area highlights that a significant proportion of patients incur a survival detriment from NAC, contradicting the "one-size-fits-all" assumption. (B) ITE by Molecular Subtype: Boxplots confirm biological heterogeneity; Luminal A and ILC subtypes show the deepest survival penalties, while TNBC exhibits high variance. (C) SHAP Summary Plot: AI-driven interpretability identifies Nodal Stage and Age as the top predictors influencing treatment efficacy. (D) Subgroup Analysis Forest Plot: Forest plot of Hazard Ratios (derived from 6-month landmark models) across key subgroups, confirming the significant survival disadvantage in Young TNBC (p = 0.016) and Node-Positive (p < 0.001) patients.
Conclusion: Our causal analysis reveals a critical disconnect between biological risk and therapeutic efficacy.
While SHAP modeling identified node-positive (cN+) status as a high-priority indicator for systemic therapy, the low real-world response rate (npCR 15.0%) rendered historical standard NAC regimens insufficient to counterbalance the risks of surgical delay (HR 1.39). Our findings indicate that without therapeutic escalation (e.g., immunotherapy) to ensure high pathologic response rates, the operational risks of deferring surgery may outweigh the benefits of downstaging in this subgroup. Our findings highlight a critical "Implementation Gap" where standard NAC regimens yield suboptimal real-world outcomes for high-risk subgroups. Our findings suggest that clinical prioritization should diverge based on subtype biology: for chemo-refractory subtypes (e.g., Luminal A, ILC), Upfront Surgery ensures immediate source control and should be prioritized; conversely, for high-risk TNBC, standard NAC is insufficient, warranting Therapeutic Escalation (e.g., immunotherapy) to minimize the risk of non-response.
Cobo, M.; Serrano, D.; Barranco, J.; Pasquier, A.; de-Torres, J. P.; Zulueta, J. J.; Echeveste, J. I.; Ezponda, A.; Argueta, A.; Sanz-Ortega, J.; Berto, J.; Alcaide, A. B.; di Frisco, M.; Felgueroso, C.; Campo, A.; de la Fuente, A. A.; Escobar, A.; Valencia, K.; Orive, D.; Ocon, M. d. M.; Globacka, H. B.; Fortuno, M. A.; Perna, V.; Rodriguez, M.; Lozano, M. D.; Calvo, A.; Pio, R.; Hung, R. J.; Seijo, L. M.; Silva, W.; Bastarrika, G.; Lloret Iglesias, L.; Montuenga, L. M.
Introduction: Low-dose computed tomography (LDCT) lung cancer screening has significantly enhanced early detection and patient survival rates in the population at risk. Current screening methods, which primarily rely on LDCT imaging, will very likely benefit from molecular biomarkers to achieve a more comprehensive, accurate, personalized and non-invasive risk assessment leveraging multimodal tools. We present a novel open access multimodal (imaging, proteomics and demographic) dataset designed to provide an available research resource on LDCT-based early lung cancer detection. The dataset includes annotated screening LDCT scans and plasma proteomics generated by the proximity extension assay (Olink) platform. Methods: The dataset integrates data from control screened individuals without nodules or with benign nodules, and LDCT-diagnosed lung cancer individuals, matched by sex, age and time between image and sample collection. Both radiological and molecular signatures were collected within a six-month window, providing detailed insights into disease progression. Nodules were considered as lung cancer cases if biopsy-confirmed lung cancer was diagnosed within 5 years after imaging, enabling the study of longitudinal biomarker evolution and its correlation with imaging findings. To complement the dataset, clinical and demographic data are also available in open access, providing a detailed overview of patient characteristics. The informed consent signed by the participants allows for unrestricted open access for requests directly or indirectly related to lung cancer research. Results: The dataset consists of annotated screening LDCT scans and plasma proteomics data measured with most of the Olink Target 96 platforms (1078 individual proteins across 12 panels focused on a specific area of disease or biology) for a total of 211 screening participants.
There are 67 lung cancer patients, 68 matched controls with benign pulmonary nodules, 71 matched controls without nodules and 5 surgically excised false positive lesions. Experiments were performed to assess the technical quality and provide a proof-of-concept of usability of the dataset, showing the alignment with findings from previous published studies. Conclusion: This comprehensive dataset aims to facilitate research towards the development of personalized multimodal artificial intelligence models. We also aim to support the investigation of the relationship between imaging and molecular data, paving the way for more accurate understanding of early lung cancer biology. Finally, our open access dataset may help to develop or validate individualized risk prediction models that could significantly advance early lung cancer detection and intervention strategies.
Flanagan, K. C.; Earls, J.; Hiken, J.; Wellinghoff, R. L.; Ponder, M.; Pemberton, K.; Macdonald, O. K.; Welaya, K.; Pippas, A. W.; D'Silva, K.; Sui, X.; Alexander, W.; Slim, J.; Saccaro, S.; Shenkenberg, T.; Bailey, S. D.; Sonnier, S. A.; Azzi, G.; Bank, B.; Kossman, S. E.; Gonzales, P.; Wade, J. L.; Hellyer, J. A.; McLeod, H. L.; Duncavage, E. J.; Glasscock, J. I.
Lung cancer remains the leading cause of cancer mortality worldwide. Immune checkpoint inhibitors targeting PD-1/PD-L1 have significantly improved outcomes in a subset of patients. OncoPrism, a clinical test employing a multidimensional predictive RNA-based immune biomarker, was evaluated for predicting immune checkpoint inhibitor (ICI) benefit in non-small cell lung cancer (NSCLC) patients. This study included data from 1,487 patients and evaluated OncoPrism across four NSCLC cohorts: one PD-L1 inhibitor cohort (n=195), one PD-1 inhibitor cohort (n=89), and two non-ICI cohorts (n=193 and n=1,010, respectively). In the PD-L1 inhibitor cohort, OncoPrism predicted progression-free survival (p<0.0001) and overall survival (p=0.043). In the PD-1 inhibitor cohort, drawn from the observational clinical trial PREDAPT (NCT04510129), which enrolled patients from 17 healthcare systems, OncoPrism predicted overall response rate (p=0.008), progression-free survival (p=0.004), and overall survival (p=0.011). PD-L1 Tumor Proportion Score (TPS) was not predictive of response, progression-free survival, or overall survival. OncoPrism did not predict overall survival across two non-ICI NSCLC cohorts (p=0.54, p=0.73), suggesting the test is specifically predictive of ICI benefit rather than being prognostic with more limited clinical utility. Overall, the data show OncoPrism-high patients are likely to benefit from a two- to three-fold increase in overall response rate, progression-free survival, and overall survival compared to those in other OncoPrism groups. These results underscore the impact of OncoPrism to address the current unmet need for ICI response prediction in NSCLC.
Gallifant, J.; Chen, S.; Shin, K.-Y.; Kellogg, K. C.; Doyle, P. F.; Guo, J.; Ye, B.; Warrington, A.; Zhai, B. K.; Hadfield, M. J.; Gusev, A.; Ricciuti, B.; Christiani, D. C.; Aerts, H. J.; Kann, B. H.; Mak, R. H.; Nelson, T. L.; Nguyen, P.; Schoenfeld, J. D.; Topaloglu, U.; Catalano, P.; Hochheiser, H. H.; Warner, J. L.; Sharon, E.; Kozono, D. E.; Savova, G. K.; Bitterman, D.
Immune-related adverse events (irAEs) affect up to 40% of patients receiving immune checkpoint inhibitors, yet their identification depends on laborious and inconsistent manual chart review. Here we developed and evaluated an agentic large language model system to extract the presence, temporality, severity grade, attribution, and certainty of six irAE types from clinical notes. Retrospectively (263 notes), the system achieved macro-averaged F1 of 0.92 for detection and 0.66 for multi-class severity grading; self-consistency improved F1 by 0.14. The best-performing configuration cost approximately $0.02 per note. In prospective silent deployment over three months (884 notes), detection F1 was 0.72-0.79. In a randomized crossover study of clinical trial staff (17 participants, 316 observations), agentic assistance reduced annotation time by 40% (P < 0.001), increased complete-match accuracy (OR 1.45; 95% CI 1.01-2.09; P = 0.045), and improved inter-annotator agreement (Krippendorff's α from 0.22-0.51 to 0.82-0.85). These results demonstrate that agentic AI coupled with human verification could enhance efficiency, performance, and consistency for irAE assessment.
Vellanki, S.; Feiszt, P.; Kenny, P. A.
Standard pathology workup sometimes fails to definitively identify tumor tissue-of-origin in cancers with ambiguous diagnoses or unknown primary sites, complicating treatment decisions. Molecular assays can aid diagnosis but require additional tissue and increase healthcare costs. Intending to leverage routinely collected somatic mutation profiles from comprehensive genomic profiling, we developed Tumor-Origin.com, a machine learning platform to predict tumor tissue-of-origin from mutation data alone. We trained five classifiers on 10,945 tumor mutation profiles from the MSK-IMPACT cohort and validated performance on an independent set of 770 tumors from the Gundersen Precision Oncology cohort spanning 52 cancer types. Performance was strongest for the most common tumor types, reflecting their relative over-representation in training data. Among cancer types with more than five cases, the Logistic Regression classifier achieved the highest average top-3 accuracy of 49%, followed by the Support Vector Machine at 43%. At least one algorithm delivered ≥40% accuracy in 23 cancer types. Our integrated platform thus provides robust tumor origin predictions across diverse cancers. We have implemented a web-based tool (https://tumor-origin.com) to assist clinicians and researchers in refining diagnoses of cancers of unknown primary without requiring additional tissue or costly testing.
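As an illustration of the top-3 accuracy metric quoted in the abstract above, the following is a minimal Python sketch. It is not the Tumor-Origin.com implementation: the function name `top_k_accuracy`, its arguments, and the toy labels are hypothetical, and it assumes each classifier returns candidate tissues ranked from most to least likely.

```python
from typing import Sequence


def top_k_accuracy(
    true_labels: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int = 3,
) -> float:
    """Fraction of cases whose true label is among the first k ranked guesses.

    Assumption: ranked_predictions[i] is ordered from most to least likely.
    """
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("true_labels and ranked_predictions differ in length")
    if len(true_labels) == 0:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data (hypothetical, not from the study): the first case is a top-3 hit,
# the second is a miss because "breast" is absent from the first three guesses.
truth = ["lung", "breast"]
ranked = [
    ["colon", "lung", "skin", "breast"],
    ["colon", "skin", "lung", "breast"],
]
score = top_k_accuracy(truth, ranked, k=3)
```

With k set to 1 the same function reduces to ordinary accuracy, which is why top-3 figures are always at least as high as top-1 figures for the same classifier.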
Christopoulos, P.; Blasi, M.; Langer, S.; Shi, S.; Cvetkovic, J.; Bozorgmehr, F.; Allgaeuer, M.; Yuskaeva, K.; Schneider, M.; Shah, R.; Kuon, J.; Stenzinger, A.; Glueck, T.; Thomas, M.
Background: Older age and comorbidities complicate initial therapy in non-small-cell lung cancer (NSCLC), as platinum ineligibility has not been systematically characterized. Methods: All 2592 patients presenting with metastatic NSCLC between 2018 and 2023 at Thoraxklinik Heidelberg were analyzed. ECOG status (PS), comorbidities, molecular testing, therapy, toxicities, and outcomes were verified from individual patient records. Results: Among 1306 patients with PD-L1 0-49%, systemic therapy was initiated in 74%. With availability of monoimmunotherapy, the treatment rate for patients with PD-L1 ≥50% (n=507) was higher by 5% (p=0.01), while best supportive care (BSC) by own choice was reduced (1.8% vs. 4.5%, p=0.005) more than medical BSC (mBSC 14.6% vs. 17.8%, p=0.11), and early death remained unchanged (ca. 4%). Initial suitability for systemic therapy was documented for 70% of cases eventually receiving mBSC after deterioration associated with comorbidities, metastatic burden, longer workup duration, or radiotherapy upfront (all p<0.001). The atezolizumab Summary of Medicinal Product Characteristics (SmPC) criteria, i.e. >80 years, or PS ≥3, or comorbidities with PS ≥2 or with age ≥70, were fulfilled by 38% of patients (n=501) and associated with a >3-fold higher risk of BSC or early death (230/501), as well as significantly higher toxicity under platinum and shorter survival, which for a platinum dose ratio ≤60% across 4 cycles (9% of 1306) was similar to that with single-agent chemotherapy (median 5.1 months, p<0.001). SmPC criteria correlated better than comorbidity scores with foregoing platinum, but predictive performance for individual patients remained modest (AUC 0.71, p<0.001). Conclusions: The high initial attrition of approximately 25% in NSCLC could improve with availability of monoimmunotherapy, but requires optimized, faster patient workflows for better mitigation.
Adoption of the SmPC criteria could support a priori identification of patients at risk for mBSC or platinum overtreatment to enhance utilization of monoimmunotherapy and other novel platinum-free first-line options in the future.
Highlights:
- A high initial attrition of approximately 25% is caused by deterioration after histologic diagnosis in advanced NSCLC.
- Monoimmunotherapy and optimized workflows may facilitate treatment for ca. 15% additional stage IV NSCLC patients.
- SmPC criteria indicate cases at higher risk for BSC (>3x) or platinum overtreatment (i.e. platinum dose ratio ≤60%).
- SmPC patients receiving platinum have higher toxicity and shorter survival than non-SmPC patients.
- Improved therapeutic allocation will be essential for utilization of any novel platinum-free option in the future.
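The SmPC criteria quoted in this abstract (>80 years, or PS ≥3, or comorbidities with PS ≥2 or with age ≥70) can be read as a simple rule. The sketch below is one possible reading, not the authors' code: it assumes "comorbidities" combines with either PS ≥2 or age ≥70, and the `Patient` class, its field names, and the boolean comorbidity flag are hypothetical, since the abstract does not say which comorbidities qualify.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int                        # age in years
    ecog_ps: int                    # ECOG performance status, 0-4
    has_relevant_comorbidity: bool  # hypothetical flag; not defined in the abstract


def meets_smpc_criteria(p: Patient) -> bool:
    """One reading of the criteria as summarized in the abstract.

    Assumed grouping: age > 80, OR PS >= 3,
    OR (comorbidity AND (PS >= 2 OR age >= 70)).
    """
    return (
        p.age > 80
        or p.ecog_ps >= 3
        or (p.has_relevant_comorbidity and (p.ecog_ps >= 2 or p.age >= 70))
    )


# Toy patients (hypothetical values, not study data).
examples = [
    Patient(age=81, ecog_ps=0, has_relevant_comorbidity=False),  # age alone
    Patient(age=72, ecog_ps=1, has_relevant_comorbidity=True),   # comorbidity + age
    Patient(age=72, ecog_ps=1, has_relevant_comorbidity=False),  # no criterion met
    Patient(age=65, ecog_ps=2, has_relevant_comorbidity=True),   # comorbidity + PS
]
flags = [meets_smpc_criteria(p) for p in examples]
```

A rule of this kind is easy to audit at the bedside, which may be part of why the abstract reports it tracking platinum avoidance better than comorbidity scores, even though its individual-level discrimination remained modest.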